Detection of Alzheimer's disease (AD) at the first stages of the pathology is an important task to accelerate 
the development of new therapies and improve treatment. Compared to AD detection, the prediction of AD 
using structural MRI at the mild cognitive impairment (MCI) or pre-MCI stage is more complex because 
the associated anatomical changes are more subtle. In this study, we analyzed the capability of a recently pro- 
posed method, SNIPE (Scoring by Nonlocal Image Patch Estimator), to predict AD by analyzing entorhinal 
cortex (EC) and hippocampus (HC) scoring over the entire ADNI database (834 scans). Detection (AD vs. 
CN) and prediction (progressive — pMCI vs. stable — sMCI) efficiency of SNIPE were studied using volumetric 
and grading biomarkers. First, our results indicate that grading-based biomarkers are more relevant for pre- 
diction than volume-based biomarkers. Second, we show that HC-based biomarkers are more important than 
EC-based biomarkers for prediction. Third, we demonstrate that the results obtained by SNIPE are similar to 
or better than results obtained in an independent study using HC volume, cortical thickness, and tensor- 
based morphometry, individually and in combination. Fourth, a comparison of new patch-based methods 
shows that the nonlocal redundancy strategy involved in SNIPE obtained similar results to a new local 
sparse-based approach. Finally, we present the first results of patch-based morphometry to illustrate the 
progression of the pathology. 

© 2012 The Authors. Published by Elsevier Inc. All rights reserved. 



1. Introduction 

The diagnosis of Alzheimer's disease (AD) at pre-clinical stages or the 
prediction of conversion of patients with mild cognitive impairment 
(MCI) to AD is a very challenging problem receiving attention because 
of the immense associated social and economic costs. Longitudinal 
studies have examined cognitive capacities during aging and demonstrate 
that alterations with significant decline occur more than a decade 
prior to clinical diagnosis (Amieva et al., 2008; Elias et al., 2000). Re- 
search from diverse scientific disciplines has focused increasing attention 
on identifying the earliest prodromal signs and risk factors for 
Alzheimer's disease (Ballard et al., 2011). 

Several biomarker candidates have already been studied in depth 
with the goal of achieving this task. For example, the presence of 
amyloid-β (Aβ), a hallmark of AD, seems to occur in the very early course 
of the pathology, long before the typical clinical, behavioral, and social 
criteria of dementia are fully met (Frisoni et al., 2010). Aβ presence can 
be studied using cerebrospinal fluid (CSF) markers or positron emission 
tomography (PET). Generally speaking, the results found are heterogeneous, 
and therefore, links between Aβ burden and cognitive deficits 
are still unknown (Aizenstein et al., 2008; Chetelat et al., 2010; 
Kantarci et al., 2012; Villemagne et al., 2011). By contrast, biomarkers 
based on anatomical magnetic resonance imaging (MRI) are increasingly 
under investigation because they are considered more sensitive to 
pathology evolution in the pre-dementia stage of AD (Frisoni et al., 
2010). Usually, these imaging biomarkers are used to detect abnormal 
patterns of atrophy caused by AD on key structures in the brain; such 
patterns are considered the macroscopic signs of microscopic alterations. 

The structures in the medial temporal lobe (MTL) are being studied 
especially intensively because of their strong involvement in the pathogenesis 
of AD (Braak and Braak, 1991). Recent MRI studies have also 
contributed to understanding the structural changes underlying AD cog- 
nitive impairment by demonstrating the association of cognitive difficul- 
ties with reductions in hippocampal volume (de Jong et al., 2008). 
Accordingly, the histopathological investigation of Braak and Braak 
(1991) suggests that AD begins with the formation of 
neurofibrillary tangles in the MTL, particularly the entorhinal cortex 
(EC), a structure of the parahippocampal cortex, which then continues 
in the hippocampus (HC) and from there expands to other structures 
throughout the neocortex. Therefore, using EC and HC atrophy as early 
imaging biomarkers is considered a promising way of following the pro- 
gression of AD (Frisoni et al., 2010), especially since changes in these 
structures are closely related to modifications in the subject's cognitive 
performance. However, the automatic extraction of these MTL structures 
is challenging, especially in the case of the EC (Du et al., 2001). Moreover, 
the intersubject variability of brain anatomy tends to limit AD detection 
methods that use only volumetric approaches (Coupe et al., 2012a; Wolz 
et al., 2011b). These two aspects limit the capability of volume-based 
imaging biomarkers that use MTL structures to characterize the earliest 
stages of AD as well as to develop efficacious strategies for prevention 
or early intervention. 

Recently, we proposed new methods to address these issues: We 
developed a robust approach to automatically segment the HC and EC 
(Coupe et al., 2011) and introduced a new scoring method to enable better 
characterization of structure atrophy (Coupe et al., 2012a). In the latter 
work, scoring of the structure under consideration is achieved by 
estimating the nonlocal similarity of the subject to different training 
populations. Because it uses a nonlocal framework, our Scoring by 
Nonlocal Image Patch Estimator (SNIPE) addresses the problem of 
intersubject variability nicely by enabling a one-to-many mapping 
between the subject's anatomy and those of the training templates. 
Moreover, by employing the patch-based comparison principle, SNIPE 
detects subtle changes caused by the disease, as already shown in 
Coupe et al. (2012a). In this previous study, we demonstrated the high 
success rate of SNIPE at detecting AD (i.e., AD patients vs. cognitively nor- 
mal (CN) individuals) in a subset of the Alzheimer's Disease Neuroimag- 
ing Initiative (ADNI) database (i.e., 100 subjects). 

From a clinical perspective, the ability to predict AD (i.e., identifying 
progressive (pMCI) vs. stable MCI (sMCI)) is more crucial than being 
able to detect AD. However, prediction is clearly more challenging 
because (i) the anatomical changes to be identified are more subtle at 
the prodromal phase of the disease and (ii) the heterogeneous MCI 
group includes a mix of individuals, some who will convert to AD and 
others who will not. The distinction between the two is the crucial test 
for any proposed biomarker. Recently, several studies have compared 
the sensitivity and accuracy to differentiate between sMCI and pMCI of 
a number of structural imaging biomarkers such as HC volume, cortical 
thickness measurements (CTH), voxel-based methods using VBM fea- 
tures, and tensor-based methods using TBM features (Cho et al., 2012; 
Chupin et al., 2009; Cuingnet et al., 2011; Davatzikos et al., 2011; 
Koikkalainen et al., 2011; Misra et al., 2009; Querbes et al., 2009; 
Westman et al., 2011; Wolz et al., 2011b). In voxel-based methods, features 
similar to those involved in voxel-based morphometry (Ashburner 
and Friston, 2000) (i.e., the focal tissue probabilities) are used to achieve 
an individual patient's classification, sometimes after a step of dimension- 
ality reduction of the features (Kloppel et al., 2008; Vemuri et al., 2008). 
Similarly, individual classification can be also obtained using tensor- 
based morphometry features (Wolz et al., 2011b). Detailed reviews and 
comparisons of these imaging biomarkers can be found in Cuingnet et 
al. (2011) and Wolz et al. (2011b). According to these analyses, the 
accuracy of AD prediction of the usual methods (e.g., HC volume, CTH, 
VBM, or TBM) is less than 66% (Wolz et al., 2011b) when applied to the 
ADNI database. To the best of our knowledge, the highest prediction accu- 
racy obtained on all the baseline scans of the ADNI database (834 sub- 
jects) was achieved by combining the four methods, resulting in an 
accuracy of 68% for pMCI versus sMCI (Wolz et al., 2011b). 

In the current study, we investigate the capability of SNIPE to detect 
AD early using the entire ADNI database. We compare the obtained 
results with those of the different methods compared in Wolz et al. 
(2011b) by using the same cohorts and the same validation framework. 
Our analysis also includes results from a new sparse-based approach 
proposed in Liu et al. (2012). Finally, the progression of the pathology 
around the HC and EC is illustrated through a patch-based morphometry 
(PBM) analysis, as recently suggested in Coupe et al. (2012b). 

2. Materials and methods 

Data used in the preparation of this article were obtained from the 
ADNI database (adni.loni.ucla.edu). The ADNI was launched in 2003 
by the National Institute on Aging (NIA), the National Institute of Bio- 
medical Imaging and Bioengineering (NIBIB), the Food and Drug 
Administration (FDA), private pharmaceutical companies, and non- 
profit organizations as a $60 million, 5-year public-private partner- 
ship. The primary goal of the ADNI has been to test whether serial 
MRI, PET, other biological markers, and clinical and neuropsychologi- 
cal assessment can be combined to measure the progression of MCI 
and early AD. Determination of sensitive and specific markers of 
very early AD progression is intended to aid researchers and clinicians 
in developing new treatments and monitoring their effectiveness, as 
well as lessen the time and cost of clinical trials. 

2.1. MRI dataset 

2.1.1. ADNI dataset: 834 baseline scans 

The current study aims to investigate the capability of SNIPE to pro- 
duce early diagnosis of AD compared with recently proposed methods. 
In our experiment, the 834 baseline scans at 1.5 T of the ADNI database 
were used. The scans were divided into four populations, with an MCI 
subject considered progressive if he or she converted to AD as of July 
2011. This population construction resulted in the four groups compos- 
ing our dataset: 231 CN, 238 sMCI, 167 pMCI, and 198 AD. The four 
constructed cohorts are the same as those used in Wolz et al. (2011b), 
and the CN, AD, and pMCI cohorts are also the same cohorts as used in 
a recently published study that used the sparse-based method (Liu et 
al., 2012). Demographic details of the dataset can be found in Table 1. 

2.1.2. Preprocessing 

Before applying SNIPE, all the images were preprocessed through a 
fully automatic pipeline, which comprised the following steps: estimation 
of the standard deviation (SD) of Rician noise (Coupe et al., 
2010); denoising based on an optimized nonlocal means filter (Coupe 
et al., 2008); correction of inhomogeneities using N3 (Sled et al., 
1998); registration to the stereotaxic space based on a linear transform 
to the ICBM152 template (1×1×1 mm³ voxel size) (Collins et al., 1994) 
using a population-specific template derived from the ADNI database 
and constructed using the algorithm published in Fonov et al. (2011); 
linear intensity normalization of each subject on template intensity; 



Table 1 

Demographic details of the dataset used. 

        Population size   % Male   Age ± SD      MMSE ± SD
CN      231               52%      76.0 ± 5.0    29.1 ± 0.9
sMCI    238               67%      74.9 ± 7.7    27.2 ± 2.5
pMCI    167               60%      74.5 ± 7.2    26.4 ± 2.0
AD      198               50%      75.6 ± 7.7    22.8 ± 2.9

Fig. 1. Example of SNIPE workflow for an MCI subject. Once the label propagation step is finished, the resulting training libraries can be used by SNIPE to estimate the grading maps 
of the entire ADNI database (AD, pMCI, sMCI, and CN). In this study, SNIPE was applied following the procedure described in Coupe et al. (2012a, 2012b) (see Fig. 1). 

1. Template preselection: Preselection of the N/2 closest subjects from each training library (AD and CN populations) is achieved using the sum of the squared difference over the 
initialization mask. 

2. Scoring of the subject under study: For each voxel (included in the initialization mask) of the subject under study (pMCI in this example), we compared the surrounding patch 
with all the patches from the N training templates selected from the AD and CN populations. 

3. Feature extraction: The average grading value over the HC and EC segmentations is used as the relevant feature for the classification step. 

4. Classification: The final classification step is based on linear discriminant analysis using all the other subjects (AD and CN populations for AD or CN subjects, and pMCI and sMCI 
populations for MCI subjects). 



brain extraction using BEaST (Eskildsen et al., 2012); image crop around 
the structures of interest (see Fig. 1); and cross-normalization of the 
MRI intensity between the subjects using the method proposed in 
Nyul and Udupa (2000) within the estimated brain mask. 

2.2. Scoring by Nonlocal Image Patch Estimator (SNIPE) 

Inspired by our work based on a nonlocal patch-based framework for 
MRI denoising (Coupe et al., 2008) and for MRI segmentation (Coupe et 
al., 2011), we recently proposed a new method to estimate structure 
grading called SNIPE (Coupe et al., 2012a). The grading or scoring of 
the structure under consideration is achieved by estimating the nonlocal 
similarity of the subject under study to different training populations 
(see Fig. 1). With the nonlocal framework, SNIPE is able to handle 
intersubject variability by enabling a one-to-many mapping between 
the subject's anatomy and those of the training templates. Moreover, 
by employing the patch-based comparison principle, SNIPE can detect 
subtle anatomical changes caused by the disease (see (Coupe et al., 
2012a) for details). 

2.2.1. Label propagation 

The first step of the SNIPE method is to propagate a small number of 
manual segmentations over the entire training library. In this study, the 
AD and CN populations were used as the training library to achieve 
the scoring of the AD, CN, sMCI, and pMCI populations; therefore, label 
propagation was performed only on AD and CN subjects. As done in 
Coupe et al. (2012a), 20 scans were first randomly selected from the 
AD and CN populations (10 CN and 10 AD) for manual labeling. The 
HC and EC in these 20 scans were manually segmented by an expert 
using the protocol described in Pruessner et al. (2002). Then, the 
manual segmentations were used to segment the entire AD and CN 
populations, ensuring that no subject was used for its own segmen- 
tation. Finally, automatic segmentations were available for the 231 
CN subjects and 198 AD patients constituting our training library 
(see Fig. 1). 

2.2.2. Structure grading 

Once the label propagation step was finished, the resulting training 
library could be used by SNIPE to estimate the grading maps for the 
entire ADNI database (AD, pMCI, sMCI, and CN). SNIPE was applied 
according to the following procedure (see Fig. 1): 

1) Template selection: The selection of the N/2 closest subjects from 
each training population (i.e., AD and CN) is achieved using the 
sum of the squared difference (SSD) over an initialization mask. For 
the AD and CN subjects, we removed the subject under study from 
the training library. 
2) Scoring of the subject under study: For each voxel (included in the 
initialization mask) of the subject under study (pMCI in the example 
provided in Fig. 1), we compared the surrounding patch with all the 
patches from the N training templates selected from the AD and CN 
populations. Thus, we simultaneously obtained a grading map and 
a segmentation for the HC and EC. 

3) Feature extraction: The segmentations were used to compute the 
structure volumes, and the average grading value was estimated 
over the HC and EC segmentations. Both biomarkers were used as 
features in the classification step. 
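
The template selection and scoring steps above can be illustrated with a minimal sketch. It assumes a weighting scheme in the spirit of Coupe et al. (2012a): each library patch contributes its population label (+1 for CN, -1 for AD) with a weight that decays exponentially with the patch SSD. The function names, the smoothing parameter h, and the use of flat intensity lists in place of 3D patches are illustrative assumptions, not the authors' implementation.

```python
import math

def ssd(a, b):
    """Sum of squared differences between two equally sized intensity lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def select_templates(subject, templates, n_select):
    """Indices of the n_select templates closest to the subject (smallest SSD)."""
    order = sorted(range(len(templates)), key=lambda i: ssd(subject, templates[i]))
    return order[:n_select]

def grade_voxel(subject_patch, library_patches, library_labels, h):
    """Weighted mean of population labels for one voxel.

    Assumed convention: label +1 for CN patches, -1 for AD patches, and
    weight exp(-SSD / h**2), where h is a hypothetical smoothing parameter.
    """
    weights = [math.exp(-ssd(subject_patch, p) / h ** 2) for p in library_patches]
    return sum(w * s for w, s in zip(weights, library_labels)) / sum(weights)

# Toy example: the subject patch resembles the CN patch, so the grade is positive.
patches = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
labels = [1, -1]  # +1 = CN, -1 = AD (assumed convention)
print(grade_voxel([0.1, 0.0, 0.1], patches, labels, h=0.5))
```

In practice, select_templates would be called once per training population to keep the N/2 closest AD and the N/2 closest CN templates, and the per-voxel grades would then be averaged over the segmented structure to obtain the grading feature.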

2.2.3. Classification 

The classification step is based on linear discriminant analysis (LDA). 
In Coupe et al. (2012a), we showed that slightly better classification 
accuracy could be obtained for AD vs. CN using quadratic discriminant 
analysis (QDA); however, to enable comparison with recently published 
results based on linear classification techniques (Cuingnet et al., 2011; 
Wolz et al., 2011b), we used LDA in this study. Moreover, in Coupe et 
al. (2012a), we demonstrated that better classification accuracy could 
be achieved by using subject age as a feature in addition to volume or 
grade. Therefore, all the presented results for grade and volume 
biomarkers were obtained using the ages of the subjects as an additional 
feature in LDA. The correlation between the imaging biomarkers used 
and subject age is also studied here. 
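
As an illustration of the classification step, the following sketch fits a two-class LDA on two features per subject, an imaging biomarker and age. It is a plain textbook LDA with a pooled covariance matrix and class priors taken from the training frequencies; the function names and the toy values are hypothetical and are not taken from the study.

```python
import math

def fit_lda(features, labels):
    """Fit a two-class LDA on 2-D features (biomarker, age); labels are 0 or 1."""
    groups = {0: [], 1: []}
    for x, y in zip(features, labels):
        groups[y].append(x)
    means = {k: [sum(col) / len(v) for col in zip(*v)] for k, v in groups.items()}
    # Pooled within-class covariance (2x2), shared by both classes.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for k, v in groups.items():
        for x in v:
            d = [x[0] - means[k][0], x[1] - means[k][1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(features) - 2
    s = [[val / dof for val in row] for row in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [means[1][0] - means[0][0], means[1][1] - means[0][1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(means[0][i] + means[1][i]) / 2 for i in range(2)]
    prior = math.log(len(groups[1]) / len(groups[0]))
    bias = -(w[0] * mid[0] + w[1] * mid[1]) + prior
    return w, bias

def predict_lda(model, x):
    """Return 1 if x falls on the class-1 side of the linear boundary, else 0."""
    w, bias = model
    return 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0

# Synthetic (grade, age) pairs for illustration only: 0 = AD, 1 = CN.
train_x = [(-0.4, 70), (-0.6, 75), (-0.5, 80), (-0.7, 78),
           (0.5, 72), (0.3, 76), (0.4, 81), (0.2, 79)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_lda(train_x, train_y)
print(predict_lda(model, (0.45, 75)), predict_lda(model, (-0.5, 77)))
```

Adding age as a second feature lets the linear boundary tilt with age, which is one way to separate age-related from pathology-related variation in the biomarker.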

2.3. Validation framework design 

In our validation, we tried to minimize the impact of bias during 
feature extraction and feature classification. The design of this type 
of validation is challenging because of both the many possible sources 
of bias and the trade-off between bias and variance. 

2.3.1. Feature extraction 

The first source of bias may occur when a sample is involved in its own 
classification. This type of bias, known as "double-dipping" (Kriegeskorte 
et al., 2009), is sometimes difficult to avoid. With methods requiring the 
extraction of regions of interest (ROIs) where the populations differ the 
most (e.g., VBM, TBM, CTH), this type of bias occurs often and plays a 
role in recent studies carried out on the ADNI database (Koikkalainen 
et al., 2011; Querbes et al., 2009; Wolz et al., 2011b). We recently showed 
that double-dipping leads to a significantly overestimated detection and 
prediction accuracy (Eskildsen et al., in press). 

To avoid the double-dipping bias, authors usually use strategies 
based on splitting populations into training and testing folds. For 
instance, in Cuingnet et al. (2011), the studied dataset is separated into 
two subsets of similar sizes for VBM and CTH approaches. This technique 
allowed ROIs to be estimated on the training dataset and applied to the 
test dataset. However, as we will show later and as discussed in Wolz 
et al. (2011b), this type of removal of the double-dipping bias in feature 
extraction occurs at the expense of a drastic increase in variability of the 
estimated success rates during feature classification. 

In our study, we avoid this type of bias during ROI estimation since 
our ROIs are obtained by structure segmentation at the same time as 
grading estimation. In our validation framework, the template selection 
is achieved by removing the current subject from the library. For a 
given subject, the N closest training templates were selected from all 
the remaining subjects in the training library. Then, the segmentation 
and grading were obtained using these N selected training templates. 
This technique ensures that a given subject is not included in the training 
library used by SNIPE for its own processing. The absence of double- 
dipping is implicit for MCI subjects since we used the AD and CN 
populations as training templates. 

2.3.2. Classification 

Once all the subjects were processed using SNIPE, the final step 
consisted in subject classification based on the extracted features, 
namely, volumes and grades. At this point, different possibilities were 
available to perform the cross-validation (CV), several of which have 
been recently used on the ADNI database. 

- Controlled 50% vs. 50%: In Cuingnet et al. (2011), the authors used 
the 50% vs. 50% procedure, randomly splitting each population 
into two subsets (one training and one testing) with similar prop- 
erties for age and gender attributes. This method should ensure 
the absence of bias during classification, but as discussed in Wolz 
et al. (2011b) and later in this paper, this validation procedure 
results in high variance of the distribution of success rates 
according to the random population splitting. 

- Repeated LNOCV: To moderate the high variance of the obtained 
success rates, Wolz et al. (2011b) proposed to use a repeated 
leave-N-out cross-validation (LNOCV) method. They used 95% of 
the datasets as the training set and the remaining 5% as the testing 
set (randomly selected). To reduce the variance of the results, they 
repeated this procedure 100 times and used the mean classification 
rate as the final result. This method requires 100×20 classifications. 

- Stratified k-fold: More recently, Liu et al. (2012) proposed to use a 
stratified 10-fold CV procedure. The dataset is first split into 10 sub- 
sets of similar sizes, while preserving the label proportion of the 
original dataset. Then, in turn, each fold is used as the test set, and 
the nine remaining folds, as the training set. 

- LOOCV: In Coupe et al. (2012a), we used a leave-one-out cross- 
validation (LOOCV) procedure. In this type of CV, the classifier is 
trained on n − 1 samples and then used to classify the remaining 
sample. This type of approach can be computationally expensive 
depending on n, the number of subjects in the dataset. 
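
Two of the procedures listed above, LOOCV and stratified k-fold, can be sketched as index generators together with a generic success-rate loop. This is a simplified illustration: the stratified split deals samples round-robin without shuffling, and the threshold classifier is a hypothetical stand-in for LDA on a single feature.

```python
def loocv_splits(n):
    """Yield (train_indices, test_indices) with exactly one held-out sample."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def stratified_kfold_splits(labels, k):
    """Yield k (train, test) index splits that preserve label proportions."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == label]
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)  # deal each class round-robin
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

def success_rate(features, labels, splits, fit, predict):
    """Fraction of held-out samples classified correctly over all splits."""
    correct = total = 0
    for train, test in splits:
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in test:
            correct += predict(model, features[i]) == labels[i]
            total += 1
    return correct / total

def fit_threshold(xs, ys):
    """Stand-in classifier: threshold halfway between the two class means."""
    m0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    m1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (m0 + m1) / 2, m1 > m0

def predict_threshold(model, x):
    threshold, one_is_high = model
    return int((x > threshold) == one_is_high)

# Synthetic one-dimensional feature, for illustration only.
xs = [1, 2, 3, 7, 8, 9]
ys = [0, 0, 0, 1, 1, 1]
print(success_rate(xs, ys, loocv_splits(len(xs)), fit_threshold, predict_threshold))
```

Because LOOCV involves no random partition, it yields a single deterministic success rate, whereas a randomized split would have to be repeated to characterize the spread of its results.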

To evaluate which method is best suited to perform the CV of the ADNI 
dataset, we compared the previously described approaches. Fig. 2 shows 
the comparison of CV procedures for AD vs. CN using HC volumes in 
terms of success rate; controlled 50% vs. 50%, 100× LNOCV, stratified 
10-fold, and LOOCV were compared using an LDA classifier over 1,000 
realizations. The mean success rates were 78.7%, 78.9%, 79.0%, and 79.1%, 
respectively, and the median success rates were 78.9%, 78.9%, 78.9%, and 
79.1%, respectively. Although both the mean and median success rate 
values were 79% for all compared validation procedures (for LOOCV, 
there is only one deterministic value), high variations were observed for 

Fig. 2. Comparison of cross-validation (CV) procedures for AD vs. CN using hippocampal 
volumes and subjects' ages in terms of success rate. The 50% vs. 50% CV, 100× 
leave-N-out CV, 10-fold CV, and leave-one-out CV were compared using LDA over 
1000 realizations. The mean success rates were 78.7%, 78.9%, 79.0%, and 79.1%, 
respectively. The median success rates were 78.9%, 78.9%, 78.9%, and 79.1%, respectively. 

the 50% vs. 50% and 100× LNOCV procedures, which led to maximum 
values of 84% and 82% respectively. This high variation in success rates 
makes it difficult to compare methods because the published results 
may be derived from the median or from the extreme limits of the distri- 
bution. Interestingly, the value provided by LOOCV is similar to the medi- 
an values of the distribution obtained with other validation procedures. 

In practice, alternative validation procedures are used in place of 
LOOCV for computational reasons when a large number of samples are 
involved. In the case of the ADNI dataset, the LOOCV required less than 
2 seconds and was faster than the 100× LNOCV. Moreover, LOOCV is 
known to be an almost unbiased estimator (Cawley and Talbot, 2004). 
Therefore, we decided to use LOOCV in our validation, since the value 
obtained with LOOCV corresponds to the median value of the distribu- 
tions obtained with other CV procedures, without any possible variations 
in published results according to the random sampling. The maximum 
values obtained by using 100× LNOCV and 10-fold CV are presented 
only for the comparison with previously published work in order to 
provide the median (i.e., LOOCV) and the upper limit of the success 
rate distributions for a fairer comparison. 

2.4. Implementation details 

In this study, we used all the parameters proposed in Coupe et al. 
(2012a), except for the patch size for EC and the number of training templates 
used, N. Recently, we showed in Hu et al. (in press) that a patch of 
5×5×5 voxels is sufficient for EC segmentation; this size is thus used for 
computational reasons. Here, we used this patch size for EC and patches of 
7×7×7 voxels for HC, as suggested in Coupe et al. (2012a) and Coupe 
et al. (2011). In Coupe et al. (2012a), we also suggested that 60% of the 
entire library be selected during template selection (i.e., 30 AD and 30 
CN of the 50 available). In this study, we used only around 25% of the entire 
library (N_AD = 50 and N_CN = 50) for computational reasons. Details 
on all other parameters can be found in Coupe et al. (2012a). 

3. Results 

3.1. SNIPE volumetric study 

Fig. 3 shows the volumes obtained by SNIPE for HC and EC. Volumes 
are plotted according to subject age for the four studied populations, and 
the distributions are presented as boxplots. We can observe a reduction 
in the volumes with age for HC, whereas for EC, this reduction is not sta- 
tistically significant as assessed by p-values and Pearson's coefficients. 
For HC, a greater reduction can be noted for the AD population, a finding 
that can be explained by the addition of age-related atrophy to that relat- 
ed to the pathology. The means of the HC volume distributions are signif- 
icantly different according to a multi-comparison test, and the expected 
order is observed (AD<pMCI<sMCI<CN). The change in evolution of EC 
volumes with age is more difficult to interpret. The low Pearson's coefficient 
r and the high p-values of the linear regressions indicate a nonsignificant 
linear correlation between EC volumes and age, except in the 
AD population. Compared with the results for the HC volumes, this find- 
ing might be due to higher intersubject variability and more frequent 
errors in the segmentation, as discussed in Coupe et al. (2012a). For EC 
then, the pathology-related patterns seem partially obscured by the 
intersubject variability. However, except for AD vs. pMCI, the means of 
EC volume distributions are significantly different according to a multi- 
comparison test at 95% confidence. Finally, a larger mean difference is 
observed between sMCI and CN volume distributions than between AD 
and pMCI (especially for EC volumes). 
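
The relationship between a biomarker and age discussed above is summarized by a linear regression and Pearson's coefficient r. A minimal sketch of both quantities follows; the function names and the toy values are illustrative, and the p-value (which requires the Student t distribution) is left out.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic ages (years) and volumes (arbitrary units), for illustration only.
ages = [60, 70, 80, 90]
volumes = [3.0, 2.8, 2.6, 2.4]
print(pearson_r(ages, volumes), linear_fit(ages, volumes))
```

A negative slope with r close to -1 corresponds to a consistent decrease of the biomarker with age; a value of r near zero corresponds to the nonsignificant trend reported for EC volumes.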

3.2. SNIPE grading study 

Fig. 4 presents the average grading values obtained by SNIPE for HC 
and EC. For the studied structures, the grading values are significantly 
correlated with age (all p-values are < 0.05) and decrease with age. 
Moreover, this correlation holds when controlling for MMSE. In comparison 
with those obtained in the volumetric study, the correlation coefficients 
obtained for grading are higher. As expected, CN subjects have 
the highest grading values, and AD patients, the lowest. Interestingly, 
the same observation holds for sMCI compared with pMCI. In all the 
studied cases, the means of the grading distributions of the studied 
populations were significantly different. The HC-grade distributions 
present lower variances and smaller overlap between populations 
compared with EC-grade distributions. In addition, the boxplots of 
grade distributions also show fewer outliers (red crosses) and a smaller 
overlap between distributions compared with volume distributions. Fi- 
nally, as we show later in the classification experiment by comparing 
volume and grade biomarkers, the higher correlation with age enables 
a better distinction between anatomical differences due to age-related 
modifications and those due to pathology-related alterations, and the 
lower intrapopulation variance enables a better distinction between 
anatomical differences due to intersubject variability and those due to 
pathology-related alterations. 

Visual assessment of the changes in the grading maps with age 
between populations is proposed in Fig. 5. The estimated scoring is visu- 
ally lower for AD than for CN. This tendency can also be observed 
between sMCI and pMCI populations, and a global decrease in grading 
values with age is visible for the four studied populations. The increased 
atrophy of HC in the oldest subjects is also visible, especially for pMCI and 
AD subjects aged 80 to 90 years, in whom the combination of age-related 
and pathology-related atrophy yields significant HC reduction. 

3.3. Comparison of SNIPE-based biomarkers 

Table 2 presents the classification success rates obtained by the im- 
aging biomarkers under consideration for AD vs. CN, pMCI vs. CN, AD 
vs. sMCI, and pMCI vs. sMCI. These results show that (i) grading-based 
biomarkers outperform volume-based biomarkers (+5% to +13%) and 
(ii) EC-based biomarkers are less efficient than HC-based biomarkers, 
except for AD vs. sMCI where both structures provided similar accuracy. 
Finally, the combination of volume and grade did not really change 
results from those obtained with the use of grade only. As assessed by 
the p-values of the McNemar test (McNemar, 1947) in Table 2, all the 
SNIPE-based biomarkers performed significantly better (i.e., p<0.05) 
than random classification for all the population comparisons consid- 
ered. In addition, in order to estimate if the difference between the clas- 
sification accuracy of biomarkers was significant, we compared the 
classification results of HC and EC, and of grading and structure volumes 
in Table 3. By using a confidence interval at 95%, all the biomarkers have 
significantly different accuracy, except HC-grade > EC-grade for AD vs. 
sMCI and pMCI vs. sMCI, and HC-vol > EC-vol for pMCI vs. sMCI. 
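
The paired comparison of two classifiers with the McNemar test can be sketched as follows. This version uses the chi-square approximation with one degree of freedom and no continuity correction, which is an assumption; the section does not state which variant was used. The function name and the toy predictions are hypothetical.

```python
import math

def mcnemar_test(truth, pred_a, pred_b):
    """McNemar statistic and p-value for two classifiers on the same subjects.

    b counts subjects that only classifier A gets right, c those that only
    classifier B gets right; subjects where both agree do not contribute.
    """
    b = sum(1 for t, a, o in zip(truth, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(truth, pred_a, pred_b) if a != t and o == t)
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Synthetic predictions: A alone is right on 20 subjects, B alone on 5.
truth = [1] * 25
pred_a = [1] * 20 + [0] * 5
pred_b = [0] * 20 + [1] * 5
print(mcnemar_test(truth, pred_a, pred_b))
```

A p-value below 0.05 would indicate that the two classifiers differ in accuracy at the 95% confidence level, which is the criterion applied in Tables 2 and 3.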

As expected, classification accuracies decrease when populations 
with closer pathological status were used (cf. Table 2). Thus, the lowest 
accuracy was obtained for the pMCI vs. sMCI comparison. Although we 
expected similar results for pMCI vs. CN and AD vs. sMCI, we found an 
important difference in the classification accuracies of these two 
comparisons. With SNIPE, a clear difference between the pMCI and CN 
populations was detected, whereas a less distinctive one was found for 
AD and sMCI. These classification results seem to show that (i) the 
pMCI population is relatively similar to the AD population, indicating 
that the pMCI population studied was advanced in pathology progres- 
sion and close to conversion, and (ii) the important difference between 
CN and sMCI may result from anatomical modifications of the HC and 
EC in these two groups that may be related to the cognitive impairment. 
Alternatively, it could point to heterogeneity in the sMCI group where 
some subjects might still convert to pMCI and AD, but have yet to 
do so. These subjects may share morphological characteristics with the 
pMCI group. To investigate these two possibilities further, we analyzed 
the classification results for AD vs. pMCI and sMCI vs. CN. As shown in 
Table 4, the detected difference for sMCI vs. CN is clearly greater than 
that for AD vs. pMCI: the classification of AD vs. pMCI using structure 
Fig. 3. SNIPE-based volumetric study. Left: Volume of HC and EC structures for studied populations according to subject age. Linear regressions are displayed for better visualization 
of global tendencies. Pearson's coefficients and p-values of the regressions are provided in the legend. Right: Boxplots of the distributions. Colored stars above the boxplots indicate 
a significantly different mean from those of other groups, obtained using a multi-comparison test at 95% confidence. 



volumes provided results not significantly different from random classification, 
since all p-values are greater than 0.05, while for sMCI vs. CN we 
obtained a significant difference for these biomarkers. 

For AD vs. CN, our results are in line with the study presented in 
Coupe et al. (2012a) on 100 baseline scans using QDA, although they 
were slightly lower for HC and better for EC. The improvement in EC 
grading might be due to the larger training library used here, which enables
a better representation of EC intersubject variability. For AD vs.
sMCI, the classification accuracy of HC grading drops to the
level of EC grading (as assessed by the p-value in Table 3) and is closer to
the accuracy observed for the pMCI vs. sMCI comparison than to that for
the pMCI vs. CN comparison. For the AD vs. sMCI comparison, HC grade
and EC grade seem to be key biomarkers to differentiate between AD 
and sMCI, whereas for the other population comparisons, HC grade is sig- 
nificantly more efficient (see Table 3). This observation is also confirmed 



by the results obtained for AD vs. pMCI (see Table 4), where EC grade provided
better results than HC grade. This finding may be related to the fact
that atrophy of the EC seems to be specific to the pathological processes
associated with AD and pMCI, while a linear decrease of HC volume with
age has been observed in healthy populations for men starting in the
third decade of life, and for women, after menopause (Pruessner et al.,
2001). Therefore, for AD vs. sMCI, the advantage of using HC-EC complex
grading compared with HC grading is the greatest (+4%, compared with around
±1% for the other comparisons; see Table 2). As shown in Coupe et al.
(2012a), for AD vs. CN, the combination of HC and EC grade tends to 
slightly improve classification accuracy. In this study, however, such 
was not the case for pMCI vs. sMCI. This result was unexpected given 
that the EC is believed to be affected before the HC in the evolution of 
the pathology (Frisoni et al., 2010) and thus should be more useful for di- 
agnosis at the early stages of the disease. As previously pointed out, the 
Fig. 4. SNIPE-based grading study. Left: Grade of HC and EC structures for studied populations according to subject age. Linear regressions are displayed for better visualization of 
global tendencies. Pearson's coefficients and p-values of the regressions are provided in the legend. Right: Boxplots of the distributions. Colored stars above the boxplots indicate a 
significantly different mean from those for other groups, obtained using a multi-comparison test at 95% confidence. 



difficulties related to EC classification (high intersubject variability in 
shape and size of EC) seem to adversely affect the usefulness of this bio- 
marker for early detection of AD-related pathology. 
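The pairwise biomarker comparisons summarized in Table 3 rely on a McNemar test applied to paired classification outcomes. As a minimal sketch, assuming the exact (binomial) two-sided form of the test, the computation could look like the following. The helper names `discordant_counts` and `mcnemar_exact` are hypothetical, and the variant of the test actually used in this study is not specified in the text.

```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count subjects on which exactly one of the two classifiers is correct.

    Returns (b, c): b = A correct and B wrong, c = A wrong and B correct.
    """
    b = c = 0
    for y, a, p in zip(y_true, pred_a, pred_b):
        if a == y and p != y:
            b += 1
        elif a != y and p == y:
            c += 1
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis, each discordant subject favors either
    classifier with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

Only the subjects on which the two biomarkers disagree contribute to the p-value, which is why two biomarkers with similar accuracies (e.g., HC grade and EC grade for AD vs. sMCI) can yield a non-significant result.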

3.4. Comparison with previous work 

Recently, several studies provided extensive comparisons of well- 
known methods such as methods based on VBM features, methods 
based on TBM features, CTH, and HC volume applied to the ADNI data- 
base (Cuingnet et al., 2011; Wolz et al., 2011b). As a result, estimations 
of the classification accuracy of different imaging biomarkers can be 
compared on the same large database. To the best of our knowledge, 
the study proposed by Wolz et al. (2011b) is currently the most 



comprehensive work performed on the ADNI database: they used all 
834 baseline scans in the ADNI database, studied different scenarios 
(AD vs. CN, pMCI vs. CN and pMCI vs. sMCI), and they also showed that 
their method obtained better results than all the methods compared by 
Cuingnet et al. (2011) (i.e., HC volume, VBM, CTH, and HC shape) on a 
smaller dataset. Therefore, we chose to compare SNIPE with the results 
presented in Wolz et al. (2011b) since they represent the best published 
results for pMCI vs. sMCI, the differentiation of which is the main chal- 
lenge from a clinical perspective. 

We also compared SNIPE with very recent work on sparse 
representation-based classifiers (SRC) applied to gray matter 
(GM) and validated in the same AD, pMCI and CN populations (Liu
et al., 2012). This SRC approach and SNIPE are based on similar
philosophies in that both approaches analyze anatomical similarities
using patch comparisons between populations. However, several
differences can be pointed out.

- First, SNIPE uses nonlocal redundancy of information, whereas Liu et
al. (2012) use local sparsity. The nonlocal/local aspect impacts the
anatomy matching of subjects, which in Liu et al. (2012) is achieved
by one-to-one mapping after nonlinear registration, whereas SNIPE
performs one-to-many mappings after linear registration. The
redundancy/sparsity aspect determines how patches are compared. With
redundancy, we try to use the largest possible number of patches to
take advantage of the repetition of useful information, thus making
a decision based on as much input as possible and minimizing potential
errors. By contrast, sparsity aims to find the smallest subset of the
most relevant patches.

- Second, SNIPE focuses on key structures such as the HC and EC, while
Liu et al. (2012) compared the entire GM area. In Liu et al. (2012), a
preselection of the ROIs within GM areas is achieved by extracting
the areas that differ most significantly between populations, similarly
to what is classically done for CTH.
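The redundancy principle described in the first point can be illustrated with a small sketch: every library patch contributes to the score of the patch under study, weighted by its similarity. The code below assumes an exponential weight on the squared patch distance (the usual nonlocal-means form), a smoothing parameter `h`, and library labels of -1 for AD and +1 for CN. These choices and the function names are illustrative assumptions, not details given in this section.

```python
import math


def patch_distance_sq(p, q):
    """Squared Euclidean distance between two patches given as flat lists."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nonlocal_grade(patch, library, h):
    """Weighted average of library labels, using all library patches.

    library: list of (patch, label) pairs, label -1 (AD) or +1 (CN).
    h: assumed smoothing parameter controlling how fast weights decay.
    """
    weights = [
        math.exp(-patch_distance_sq(patch, lib_patch) / (h * h))
        for lib_patch, _ in library
    ]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no library patch is similar enough to vote
    return sum(w * label for w, (_, label) in zip(weights, library)) / total


# A patch identical to a CN example is graded close to +1.
library = [([0.0, 0.0], 1), ([10.0, 10.0], -1)]
grade = nonlocal_grade([0.0, 0.0], library, h=1.0)
```

A sparsity-based approach would instead select a small subset of library patches (for instance by solving an L1-regularized reconstruction problem) before voting; that step is not shown here.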

Tables 5 and 6 compare SNIPE with the methods of the other two studies,
using the CV procedure proposed in each study.

For AD vs. CN, the results obtained with SNIPE were similar to those
from the combination of four methods reported in Wolz et al. (2011b)
(91% compared to 89% using 100× LNOCV, see Table 5). SNIPE
obtained better results than HC volume (Lotjonen et al., 2011),
manifold-based learning (Wolz et al., 2011a), CTH (Lerch and Evans,
2005), and the method based on TBM features (Koikkalainen et al.,
2011), although the results from multi-template TBM and SNIPE
were close, as were those from SNIPE and patch-based SRC (Liu
et al., 2012) (90% compared to 91% using k-fold CV, see Table 6).
The results obtained for HC volumes using patch-based segmentation
(Coupe et al., 2011) and multi-template nonlinear warping (Lotjonen
et al., 2011) were also close (83% compared to 81% using 100×
LNOCV, see Table 5). These findings seem to indicate that the compared



approaches provide similar segmentation accuracies. Interestingly, HC 
grade provided results that were similar to or better than those from 
methods analyzing the entire brain anatomy (i.e., methods based on
TBM features, global SVM/SRC, and advanced methods based on VBM
features such as COMPARE (Fan et al., 2007)) and requiring nonlinear
registration of all subjects.

For pMCI vs. CN, the results obtained by SNIPE were similar to
those from patch-based SRC (87% compared to 88% using k-fold CV,
see Table 6) but better than those from all the methods compared
in Wolz et al. (2011b) as well as their combination (88% compared
to 84% using 100× LNOCV, see Table 5). This finding seems to indicate
that new patch-based frameworks perform better than classical 
methods such as HC volume or methods based on TBM features. In 
addition, preselecting the most relevant GM areas or using segmenta- 
tion of key structures seems to lead to similar classification accuracy. 
The latter has the advantage of directly avoiding double-dipping. 

For pMCI vs. sMCI, the results obtained by SNIPE were better than
those from all the methods compared in Wolz et al. (2011b) (74%
compared to 68% using 100× LNOCV, see Table 5). This outcome highlights
the potential of SNIPE for AD prediction by enabling the detection
of subtle anatomical changes caused by AD at the early stages of
the pathology. Unfortunately, Liu et al. (2012) did not provide results
for this comparison, and thus the efficiency of redundancy and
sparsity cannot be compared for early detection.

3.5. Patch-based morphometry analysis 

Another important aspect of a method is its potential to visualize 
the differences between populations in a compact way. This capabili- 
ty is one explanation for the great success of the VBM, CTH, and TBM 
methods. In their discussion, Liu et al. (2012) warn that the main 
limitation of their method is the impossibility of visualizing the spa- 
tial location of the most discriminant areas between populations. 
They conclude that this limitation results in less clinical insight and 
thus a lower understanding of the pathology mechanisms. 
Table 2

Classification results obtained with different biomarkers for AD vs. CN, pMCI vs. CN,
and pMCI vs. sMCI. Results were obtained using linear discriminant analysis through
a leave-one-out cross-validation procedure. The values presented in the table
correspond to the classification accuracy (acc) in %, the sensitivity (sen) in %, the specificity
(spe) in % and the p-value of the McNemar test to assess the performance of classification
compared to random classification. For each comparison (e.g., pMCI vs. CN), the
best result is in bold and underline.

AD vs. CN         HC                          EC                          HC-EC
                  acc% / sen% / spe% (p)      acc% / sen% / spe% (p)      acc% / sen% / spe% (p)
Volume            79 / 76 / 82 (p<0.0001)     70 / 68 / 72 (p<0.0001)     78 / 76 / 80 (p<0.0001)
Grade             88 / 83 / 92 (p<0.0001)     83 / 75 / 90 (p<0.0001)     89 / 84 / 93 (p<0.0001)
Volume + Grade    87 / 83 / 91 (p<0.0001)     83 / 74 / 91 (p<0.0001)     88 / 84 / 92 (p<0.0001)

pMCI vs. CN       HC                          EC                          HC-EC
Volume            75 / 73 / 76 (p<0.0001)     69 / 66 / 71 (p<0.0001)     75 / 74 / 75 (p<0.0001)
Grade             85 / 80 / 88 (p<0.0001)     79 / 73 / 83 (p<0.0001)     86 / 80 / 89 (p<0.0001)
Volume + Grade    85 / 80 / 88 (p<0.0001)     80 / 73 / 85 (p<0.0001)     85 / 80 / 88 (p<0.0001)



Table 4

Classification accuracy obtained for AD vs. pMCI and sMCI vs. CN. Results were obtained
using linear discriminant analysis through a leave-one-out cross-validation procedure.
The presented results are the classification accuracy (acc) in %, the sensitivity (sen) in
%, the specificity (spe) in % and the p-value of the McNemar test to assess the performance
of classification compared to random classification. For each comparison the
best result is in bold and underline.

AD vs. pMCI       HC                          EC                          HC-EC
                  acc% / sen% / spe% (p)      acc% / sen% / spe% (p)      acc% / sen% / spe% (p)
Volume            56 / 51 / 59 (p = 0.163)    51 / 48 / 54 (p = 0.852)    55 / 51 / 58 (p = 0.243)
Grade             58 / 57 / 60 (p = 0.032)    62 / 63 / 60 (p = 0.002)    60 / 60 / 59 (p = 0.012)
Volume + Grade    58 / 57 / 59 (p = 0.039)    61 / 63 / 59 (p = 0.004)    60 / 61 / 59 (p = 0.008)

sMCI vs. CN       HC                          EC                          HC-EC
Volume            63 / 65 / 62 (p<0.0001)     60 / 65 / 55 (p = 0.003)    64 / 65 / 63 (p<0.0001)
Grade             69 / 74 / 63 (p<0.0001)     63 / 68 / 58 (p<0.0001)     68 / 76 / 60 (p<0.0001)
Volume + Grade    69 / 76 / 62 (p<0.0001)     64 / 72 / 55 (p<0.0001)     69 / 76 / 63 (p<0.0001)



AD vs. sMCI       HC                          EC                          HC-EC
Volume            68 / 67 / 70 (p<0.0001)     62 / 57 / 66 (p = 0.0008)   69 / 67 / 70 (p<0.0001)
Grade             73 / 71 / 75 (p<0.0001)     72 / 69 / 74 (p<0.0001)     77 / 77 / 78 (p<0.0001)
Volume + Grade    73 / 71 / 75 (p<0.0001)     73 / 70 / 75 (p<0.0001)     77 / 77 / 77 (p<0.0001)

pMCI vs. sMCI     HC                          EC                          HC-EC
Volume            62 / 61 / 63 (p = 0.0007)   59 / 59 / 59 (p = 0.018)    63 / 63 / 64 (p = 0.0003)
Grade             71 / 70 / 71 (p<0.0001)     66 / 62 / 68 (p<0.0001)     70 / 69 / 71 (p<0.0001)
Volume + Grade    71 / 70 / 72 (p<0.0001)     65 / 60 / 68 (p<0.0001)     70 / 71 / 69 (p<0.0001)



Recently, we proposed a new patch-based morphometry (PBM)
method based on SNIPE to study anatomical differences between AD
and CN in the entire brain (Coupe et al., 2012b). Instead of comparing
tissue probability as done in voxel-based morphometry, PBM compares
grading maps. Therefore the comparison between populations
is based on the score assigned to a voxel according to the similarity
of its surrounding patch with the patch libraries derived from both
populations. Here, we propose a similar approach but for studying
the typical spatial distribution of grade for each population over the
entire ADNI database. First, the grading maps were warped to our
population-specific template derived from the ADNI database and
constructed using the algorithm published in Fonov et al. (2011)
with ANIMAL non-linear registration (Collins and Evans, 1997). To
do that, each subject's T1w MRI was nonlinearly registered onto our
template. The resulting transformation was then applied to the
subject's grading maps. Finally, a mean grading map was estimated,
voxel-by-voxel, for each population using the nonlinearly warped
maps. This way, the spatial distribution of grading values was
obtained for each population studied to enable a compact visualization
of population differences.

Fig. 6 shows the mean grading maps obtained for CN, sMCI, pMCI,
and AD populations. A clear difference can be observed between each
of the populations, especially at the HC level. At the global level, the
PBM results indicate that the posterior part of the HC seems to be
the location of major differences between sMCI and pMCI while the
main difference detected between AD and CN seems to be observed
at the body and head level of the HC (i.e., anterior part). In addition,
the right HC seems to be more discriminant between CN and sMCI,
while the left HC shows a greater difference between pMCI and AD.
This might indicate that the right HC is first impacted by AD pathology.



Table 3

Comparison of the classification performance of the different SNIPE-based biomarkers.
A McNemar test was used to compare the classification accuracy of EC-based and
HC-based biomarkers, and to compare the grading-based and volume-based
biomarkers for different populations.

                  HC vol > EC vol    HC grad > EC grad    HC grad > HC vol    EC grad > EC vol
AD vs. CN         p = 0.0004         p = 0.0250           p<0.0001            p<0.0001
pMCI vs. CN       p = 0.0312         p = 0.0081           p<0.0001            p = 0.0004
AD vs. sMCI       p = 0.0274         p = 0.6135           p = 0.0360          p = 0.0003
pMCI vs. sMCI     p = 0.2685         p = 0.0648           p = 0.0019          p = 0.0221



4. Discussion 

In this study, we showed that SNIPE-based grading biomarkers provided
competitive results for early detection of AD compared with conventional
methods such as HC volume, CTH, and methods based on TBM
features. We also found that new patch-based paradigms (nonlocal 
redundancy and local sparsity) are promising ways of detecting subtle 
anatomical changes between populations. Further investigations into 
these new approaches are still required to determine the best direction 



Table 5

Comparison of classification results between SNIPE and methods studied in Wolz et al.
(2011b). Results shown are the best results obtained using 100× LNOCV. The
presented results are the classification accuracy (acc) in %, the sensitivity (sen) in %
and the specificity (spe) in %. Best result for each comparison is in bold and underline.

100× LNOCV                   AD vs. CN             pMCI vs. CN           pMCI vs. sMCI
                             acc% / sen% / spe%    acc% / sen% / spe%    acc% / sen% / spe%
SNIPE
• HC Volume                  83 / 80 / 85          78 / 77 / 78          66 / 65 / 67
• HC Grade                   90 / 86 / 93          87 / 83 / 90          74 / 73 / 74
• HC-EC Volume               80 / 80 / 81          78 / 78 / 77          67 / 66 / 68
• HC-EC Grade                91 / 87 / 94          88 / 83 / 91          73 / 72 / 74
Multi-Method (Wolz et al., 2011b)
• HC Volume                  81 / 81 / 79          76 / 77 / 76          65 / 63 / 67
• Manifold-based learning    85 / 87 / 83          78 / 81 / 75          65 / 64 / 66
• Cortical thickness         81 / 89 / 71          77 / 85 / 65          56 / 63 / 45
• Tensor-based method        87 / 90 / 84          79 / 82 / 76          64 / 65 / 62
• All                        89 / 93 / 85          84 / 86 / 82          68 / 67 / 69
Table 6

Comparison of classification results between SNIPE and methods studied in Liu et al.
(2012). Results shown are the best results obtained using 10-fold CV. The presented
results are the classification accuracy in %, the sensitivity in % and the specificity in %.
Best result for each comparison is in bold and underline.

10-Fold CV                   AD vs. CN             pMCI vs. CN           pMCI vs. sMCI
                             acc% / sen% / spe%    acc% / sen% / spe%    acc% / sen% / spe%
SNIPE
• HC Volume                  83 / 80 / 86          80 / 79 / 80          66 / 67 / 65
• HC Grade                   90 / 85 / 94          87 / 85 / 89          71 / 70 / 71
• HC-EC Volume               83 / 82 / 84          80 / 78 / 81          68 / 64 / 71
• HC-EC Grade                90 / 85 / 94          87 / 83 / 90          73 / 68 / 76
Sparse Classification (Liu et al., 2012)
• COMPARE                    81 / 79 / 83          -                     -
• Global SVM                 85 / 73 / 95          81 / 73 / 90          -
• Global SRC                 88 / 81 / 94          85 / 83 / 87          -
• Patch-based SVM            86 / 75 / 94          82 / 74 / 91          -
• Patch-based SRC            91 / 86 / 95          88 / 85 / 90          -



for future study. First, the scale of analysis needs intensive study (i.e., key 
structures vs. whole brain). In future work, we hope to analyze the grad- 
ing of the whole GM area in order to shed some light on this point. In ad- 
dition, the optimal way of comparing patches (i.e., redundancy vs. 
sparsity) should be more carefully studied by using a similar framework 
for training library construction (i.e., local vs. nonlocal). In recent 
denoising literature (Mairal et al., 2009; Manjon et al., 2012), sparsity- 
based filters seem to provide slightly better results than nonlocal 
means filters. We believe that a nonlocal sparsity approach may be a 
promising way of achieving this type of scoring, as the well-defined 
one-to-many correspondence would be coupled with the efficiency of 
a sparse-based approach. 

We also discussed the issue of the cross-validation procedure, 
highlighting that LOOCV is a good option because the published results 
can be compared without any variation due to the random splitting of 



populations. Our experiment showed that, for the ADNI database, 
LOOCV provided an estimate similar to the mean/median of the compared
CV procedures. Therefore, we used an LOOCV procedure for the comparison
of SNIPE-based biomarkers. The discussion on bias during validation
complements our recent discussion on double-dipping issues presented
in Eskildsen et al. (in press). Both the variation in success rates due to CV
and the overestimation of success rates due to double-dipping
should be considered in future studies in order to limit their impact on 
published results. 
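To make the leave-one-out procedure concrete, the sketch below computes accuracy, sensitivity, and specificity for a single scalar biomarker (for example, a mean HC grade per subject). It is an illustration under stated assumptions rather than the pipeline used in this study: the classifier is a nearest-class-mean rule, which is what one-feature linear discriminant analysis reduces to under equal priors and a shared variance, and the function names and the label convention (1 for patient, 0 for control) are hypothetical.

```python
def nearest_mean_predict(train_values, train_labels, x):
    """Assign x to the class (1 = patient, 0 = control) with the closest mean."""
    pos = [v for v, y in zip(train_values, train_labels) if y == 1]
    neg = [v for v, y in zip(train_values, train_labels) if y == 0]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    return 1 if abs(x - mean_pos) < abs(x - mean_neg) else 0


def loocv_metrics(values, labels):
    """Leave-one-out CV: each subject is classified by a model trained on the rest.

    Returns (accuracy, sensitivity, specificity) as fractions in [0, 1].
    """
    if labels.count(1) < 2 or labels.count(0) < 2:
        raise ValueError("need at least two subjects per class")
    tp = tn = fp = fn = 0
    for i, (x, y) in enumerate(zip(values, labels)):
        # Train on every subject except subject i.
        train_v = values[:i] + values[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        pred = nearest_mean_predict(train_v, train_y, x)
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    acc = (tp + tn) / len(values)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    return acc, sen, spe
```

Because every subject is held out exactly once, the result is deterministic for a given dataset, which is the property that makes LOOCV results comparable across publications.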

The comparison of SNIPE-based biomarkers in the context of early de- 
tection demonstrated the high potential of the proposed framework for 
this key clinical problem. Although the prediction rate obtained (71%
with LOOCV, 73% with 10-fold CV and 74% with 100× LNOCV) is not yet
suitable for clinical use, the recent progress of MRI-based biomarkers on
this challenging classification problem is encouraging. In fact, until very
recently, the highest success rate was only around 56% on the ADNI
database (Davatzikos et al., 2011) using advanced VBM-like analysis such as
Spatial Pattern of Abnormalities for Recognition of Early AD (SPARE-AD)
(Misra et al., 2009). It is also encouraging to note that the improvements
brought by SNIPE were not obtained at the expense of method or computational
complexity. SNIPE requires only linear registration and can be
implemented easily. In addition, its computational time is around 5 minutes
per subject using a CPU implementation, and this time can be further
reduced by using GPU implementations, as already proposed for real-time
processing in the denoising literature (Palhano Xavier de Fontes et al., 2011).
In the case where the computational cost is not a limiting factor, variants
of SNIPE based on nonlinear registration might be used by involving local
or semi-local label fusion methods (Sabuncu et al., 2010; Wang et al.,
2011) after nonlinear registration of all the subjects. This would result
in a method similar to the sparse-based method (Liu et al., 2012) mentioned
in this paper. The combination of a nonlocal patch-based method
with nonlinear registration has been recently proposed for segmentation
(Fonov et al., 2012). Finally, the simplicity of the SNIPE framework results
in a robust pipeline; the processing failure rate was less than 1.7% at the




Fig. 6. Mean grading map for each population overlaid on our population-specific template derived from the subset of the ADNI database. These mean grading maps were obtained
by first nonlinearly registering all the grading maps of the ADNI database to our population-specific template. Then, the warped grading maps were averaged according to the population.
The grading values are displayed with the same range [-0.15, 0.15] for the four populations. The values above 0.15 are displayed in white and the values under -0.15 are
displayed in black.
linear registration step, a much lower failure rate than the
13% obtained for the CTH method presented in Wolz et al. (2011b).

The last part of this study was dedicated to the analysis of pathology 
progression using patch-based morphometry (PBM) (Coupe et al., 
2012b). With this new approach, we were able to present the mean 
grading map for each population. Global PBM results seem to indicate 
that the anterior part of the HC (i.e., head and anterior body) is the 
most discriminant area between AD and CN populations. More interestingly,
the first alterations of the HC seem to be located in the posterior
part (i.e., tail and posterior body). In further work, our PBM results
should be analyzed using an HC subfield atlas, as already done in the literature
using HC shape analysis (Apostolova et al., 2006; Frisoni et al., 2008; 
Gerardin et al., 2009) or volumetric approaches (Atienza et al., 2011; 
Hanseeuw et al., 2011; Mueller et al., 2010). This type of HC subfields 
analysis should enable a comparison of our findings with current knowl- 
edge about AD progression derived from histological studies (Lace et al., 
2009; Schonheit et al., 2004). 
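The averaging step behind the mean grading maps can be sketched as follows. The example assumes that each subject's grading map has already been warped to the common template and flattened into a list of voxel values of identical length. The function names are hypothetical, and the display range [-0.15, 0.15] is taken from the caption of Fig. 6.

```python
def mean_grading_maps(maps, groups):
    """Average warped grading maps voxel-by-voxel within each population.

    maps: list of flat lists of voxel grades, all in template space.
    groups: population label of each map (e.g., "CN", "sMCI", "pMCI", "AD").
    Returns a dict mapping each population to its mean map.
    """
    sums, counts = {}, {}
    for voxels, group in zip(maps, groups):
        if group not in sums:
            sums[group] = [0.0] * len(voxels)
            counts[group] = 0
        if len(voxels) != len(sums[group]):
            raise ValueError("all maps must share the template grid")
        for idx, value in enumerate(voxels):
            sums[group][idx] += value
        counts[group] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


def clip_for_display(values, lo=-0.15, hi=0.15):
    """Saturate values outside the display range, as done for visualization."""
    return [min(max(v, lo), hi) for v in values]
```

In practice the maps would be 3D volumes handled with an array library; flat lists are used here only to keep the sketch self-contained.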

5. Conclusion 

This study analyzed the capability of SNIPE to perform early detection 
of AD. The experiments were carried out on the entire ADNI database 
(834 subjects). A comparison with recent methods proposed for the cru- 
cial problem of AD prediction highlights the competitive results obtained 
by SNIPE-based biomarkers. In addition, the first results of patch-based 
morphometry analysis were presented as a new way of studying pathology
progression. Finally, a discussion was provided on the promising
results obtained by new patch-based frameworks based on redundancy
and sparsity.